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Abstract — Super-resolution  algorithms  produce  a  single  high- 
resolution  image  from  a  set  of  several,  low-resolution  images 
of  the  desired  scene.  The  low-resolution  frames  are  shifted 
differently  with  respect  to  the  high  resolution  frame  with  subpixel 
increments.  This  paper  presents  first  a  theoretical  overview 
of  super-resolution  algorithms.  The  most  important  methods, 
namely,  the  iterative  back-projection,  projection  onto  convex  sets, 
and  maximum  a  posteriori  estimation  are  then  compared  within 
the  same  framework  of  implementation. 

I.  Introduction 

In  many  applications  of  remote  sensing,  images  of  high 
resolution  (HR)  are  required.  Super-resolution  (SR)  achieves 
this  by  combining  several  low-resolution  images  of  the  desired 
scene.  These  LR  images  can  be  acquired  either  as  a  sequence 
over  time  by  the  same  sensor,  or  taken  at  the  same  time  with 
different  sensors.  A  single  HR  image  can  then  be  constructed 
only  if  the  LR  frames  are  shifted  with  respect  to  the  HR  grid 
differently  from  each  other  and  with  sub-pixel  increments. 
Thus  each  LR  frame  contains  unique  information  which  cannot 
be  obtained  from  the  other  LR  frames  and  this  is  exploited  to 
obtain  a  HR  image. 

II.  Problem  Statement 

Consider  a  high-resolution  image  of  size  x 

L2M2  written  as  a  lexicographically  ordered  vector  z  = 
[z\,  Z2,  ■  ■  ■ ,  Zn}1  ,  where  N  =  L1M1L2M2.  Note  that  z  is 
considered  to  be  generated  by  sampling  a  continuous  scene. 
Let  us  also  define  N±  =  L\M\  and  N2  =  L2M2. 

If  Li  and  L2  represent  the  downsampling  factors  in  the 
x  and  y  directions  respectively,  then  each  observed  low- 
resolution  frame  is  of  size  M\  x  M2.  Let  P  be  the  number 
of  the  available  LR  frames.  In  a  similar  manner,  the  A;th 
LR  frame  may  be  expressed  in  lexicographical  notation  as 
y k  =  [jffei,  yk2,  •  •  • ,  ykM]T,  for  k  =  1, 2, . . . ,  P,  where  M  = 
MiM2.  The  complete  set  of  the  LR  frames  is  denoted  by 

y=  [yf  ,yZ ,  ■  ■  ■  ,yp\T  =  [yi,y2,---,ypM]T. 

First,  an  observation  model  relating  the  LR  frames  to  the 
HR  image  should  be  formulated.  The  observed  LR  frames 
are  assumed  to  have  been  produced  by  geometric  warping, 
blurring,  and  uniform  downsampling  performed  on  the  HR 
image  z.  Moreover,  each  LR  frame  is  typically  corrupted 
by  additive  Gaussian  noise  which  is  uncorrelated  between 
different  LR  frames.  Thus,  the  A’th  LR  frame  may  be  written 
as 

yk  =  DBtTfcZ  +  nfe,  for  k  =  1, 2, . . . ,  P  (1) 


where  T;,.  represents  a  warping  matrix  of  size  N  x  N  which 
models  the  motion  that  occurs  during  image  acquisition,  B;,. 
is  an  N  x  N  blurring  matrix  which  can  be  either  linear 
space  invariant  or  linear  space  variant,  and  is  caused  by 
optical  system  problems  (such  as  imaging  systems  out  of 
focus,  operating  beyond  the  diffraction  limit,  suffering  from 
aberration,  etc.),  relative  motion  between  the  imaging  system 
and  the  imaged  scene,  and  the  point  spread  function  (PSF)  of 
the  LR  sensor,  D  is  an  M  x  N  downsampling  matrix  which  is 
employed  to  generate  aliased  LR  frames  from  the  warped  and 
blurred  HR  image,  and  ri/.  denotes  a  lexicographically  ordered 
M-dimensional  noise  held. 

The  observation  model  may  be  simplified  as  follows: 

Yk  =  WfcZ  +  n*;,  for  k=  1,2,  ...,P  (2) 

where  W /.  =  DB/  T /  is  an  M  x  N  matrix. 

Note  that  the  above-mentioned  imaging  model  assumes  that 
the  HR  image  z  remains  constant  during  the  acquisition  of  the 
LR  frames,  except  for  any  motion  described  by  the  warping 
matrix  T/,.  Therefore,  Wkmr  are  functions  of  the  motion 
parameters  of  each  LR  pixel  relative  to  the  fixed  HR  grid. 
In  consequence,  (2)  may  be  rewritten  as 

N 

ykm  =  ^  '  tlJkrnr  (s/;: )  Zr  ~t“  tlkm  (3) 

r=  1 

where  ykm  is  the  value  of  the  ?nth  pixel  of  the  A’th  LR 
frame,  and  vector  =  [sfci,  Sfc2,  ■  ■  ■ ,  skQ]T  encompasses 
the  Q  registration  parameters  for  the  A;th  LR  frame  with 
the  coordinate  system  of  the  HR  image.  In  practice,  the 
motion  parameters  are  not  often  known  a  priori  and  should 
be  estimated  along  with  the  HR  image  z. 

III.  Super-resolution  methods 

SR  reconstruction  techniques  can  be  employed  in  the  spatial 
or  frequency  domain.  Spatial  domain  algorithms  allow  more 
flexibility  in  incorporating  a  priori  constraints,  noise  models, 
and  spatially  varying  degradation  models.  We  compare  here  3 
spatial  domain  methods. 

A  typical  SR  image  reconstruction  algorithm  consists  of 
three  stages,  namely,  registration,  interpolation  and  restoration. 
These  steps  can  be  performed  independently  or  simultaneously 
depending  on  the  approach  taken.  Registration  refers  to  the  es¬ 
timation  of  the  relative  shifts  of  each  LR  frame  with  respect  to 
a  reference  LR  image  with  subpixel  accuracy.  Since  the  shifts 
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between  LR  images  are  arbitrary,  non-uniform  interpolation 
is  required  to  obtain  a  uniformly  spaced  HR  image.  Finally, 
image  restoration  is  applied  to  the  upsampled  image  to  remove 
blurring  and  noise. 

A.  Iterative  Backprojection  (IBP) 

Given  an  estimate  of  the  HR  image  and  a  model  of  the 
imaging  process,  a  set  of  simulated  LR  frames  is  generated. 
These  simulated  LR  frames  are  then  compared  with  the 
observed  LR  frames.  The  difference  (error)  between  the  two 
sets  is  used  to  correct  the  estimate  of  the  HR  image.  The 
process  is  repeated  iteratively  until  some  stopping  criterion  is 
met,  such  as  the  minimization  of  the  energy  of  the  error,  or 
until  the  maximum  number  of  allowed  iterations  is  reached. 
This  method  was  formulated  by  Irani  and  Peleg  [1],  based  on 
the  ideas  presented  in  [2]  and  (3]. 

Note  that  the  LR  frames  are  registered  with  each  other 
to  subpixel  accuracy  and  an  HR  grid  is  defined  in  a  certain 
relative  shift  from  all  LR  frames.  The  shifts  are  not  updated 
during  the  subsequent  process  which  is  aimed  at  defining 
values  for  the  pixels  of  the  HR  grid. 

The  method  can  be  used  to  incorporate  constraints,  such  as 
smoothness  or  any  other  additional  constraint  which  represents 
a  desired  property  of  the  solution.  The  method  employs  a 
back-projection  operator  Hpp.  This  can  potentially  affect 
the  solution  to  which  the  iterative  algorithm  converges.  The 
main  practical  weakness  of  the  method  lies  exactly  in  the 
choice  of  Hpp,  especially  when  a  priori  constraints  should  be 
included.  In  addition,  only  linear  constraints  can  be  considered. 
Thus,  although  the  IBP  based  methods  provide  a  mechanism 
for  constraining  the  SR  reconstruction  to  conform  with  the 
observation  data,  they  cannot  resolve  the  problem  of  non- 
uniqueness.  It  can  be  proven  that  in  the  common  case  of  a 
real  and  symmetric  point  spread  function  (PSF)  H,  a  possible 
choice  of  H/!/>  is  Hpp  =  H  [2].  Irani  and  Peleg  extend  their 
work  in  [4]  by  considering  a  more  general  motion  model. 


B.  Projection  Onto  Convex  Sets  (POCS) 

The  most  prominent  feature  of  the  POCS  formulation  is 
the  ease  with  which  prior  knowledge  about  the  solution  can 
be  incorporated  into  the  reconstruction  process.  However,  for 
complex  constraints  the  projection  operators  may  be  difficult 
to  compute. 

Consider  the  following  constraints  which  are  imposed  by 
the  complete  set  of  the  observed  LR  frames: 

Cmim2k  =  { z{n1,n2)\r \  ;(toi,to2)  <  <50}  (4) 


1  <  to i  <  Mi,  1  <  m2  <  M2,  k  =  1, 2, . . . ,  P 


where 


JVj  n2 

r[z\mi,m2)  =  yk(mi,m2)-  E  E  z(ni,n2)h(mi,  m2;  ru,  n2) 

n\= 1 n2=l 

(5) 

z(ni,n2)  denotes  the  (rii,  n2)th  HR  pixel,  yk(mi,m2)  the 
(toi,  TO2)th  pixel  of  the  fcth  LR  frame,  h  the  PSF  function  and 
<5q  represents  the  uncertainty  that  we  have  in  the  observation. 


(  z) 

Therefore,  the  residual  (m.i,m2),  which  is  actually  the 
observation  noise  ri/( ,  should  be  below  the  uncertainty  level  <j0 
which  can  be  set  according  to  the  noise  statistics.  So,  Cmim2k 
is  the  set  of  possible  solutions  that  make  the  value  of  pixel 
(toi,to2)  in  the  fcth  LR  frame  to  differ  by  less  than  do  from 
the  observed  value. 

These  constraints,  which  are  also  known  as  data  consistency 
constraints,  are  closed  and  convex,  and  define  a  feasible 
solution  z  in  the  field  of  HR  images  for  which  the  sensor 
outputs  are  the  observed  LR  frames.  Each  constraint  is  defined 
for  a  single  LR  pixel,  and  thus  there  are  M  =  M\M2  such 
constraints  for  a  single  LR  frame,  and  PM  constraints  for  the 
complete  LR  dataset. 

According  to  [5],  the  projection  of  an  arbitrary  image 
z{n\,  n2)  onto  Cmim2k  is  given  in  (6).  If  we  enumerate  pixels 
(mi,  m2)  by  a  single  index,  the  above  projection  may  be 
denoted  as  P,;z  where  i  takes  values  1,2,...,  PM,  with  PM 
being  the  total  number  of  pixels  in  all  LR  available  frames. 
Then  these  projection  operators  have  to  be  applied  in  turn 
for  all  LR  pixels.  According  to  the  fundamental  theorem  of 
POCS,  having  PM  data  consistency  constraints  Cmirn2k  for 
the  complete  set  of  LR  images,  the  sequence 

z(n+t)  =  VPMVPM-l  . . .  P2Piz<">  (7) 

converges  to  the  desired  HR  image  z  for  an  initial  estimate 
z'{)) .  Note  that  the  reached  solution  is  in  general  non-unique 
and  depends  on  the  initial  guess.  However,  additional  con¬ 
straints  may  be  imposed  from  prior  knowledge  to  favor  a 
particular  HR  image.  Moreover,  they  enable  robust  perfor¬ 
mance  in  the  presence  of  inconsistent  or  missing  data.  These 
constraints  may  be  amplitude  constraints,  energy  constraints, 
a  reference  image  constraint,  or  a  bounded  support  constraint. 
The  projection  operators  of  these  constraints  are  relatively  easy 
to  formulate.  More  constraints  may  be  defined  in  a  similar 
manner,  and  incorporated  in  the  iterative  sequence  of  (7). 

As  in  the  IBP  method,  the  registration  parameters  of  all  LR 
frames  with  respect  to  the  HR  grid  and  with  respect  to  each 
other  are  computed  once  and  assumed  known  throughout  the 
whole  process.  The  PSF  which  relates  the  HR  pixels  with  the 
LR  pixels  is  also  assumed  to  be  known.  Thus  the  purpose  of 
POCS  is  to  assign  values  to  the  pixels  of  the  HR  grid. 

C.  Probabilistic  Methods 

According  to  this  family  of  methods,  the  HR  image  z 
and  the  registration  parameters  s  are  considered  as  random 
fields  described  by  the  joint  prior  probability  density  function 
P(z,s).  The  steps  of  a  typical  maximum  a  posteriori  (MAP) 
reconstruction  method  are  described  below.  The  task  is  to  form 
a  joint  MAP  estimate  of  z  and  s  given  the  observed  LR  frames 
y.  Then  the  estimates  can  be  computed  as 

(z,  s)  =  arg  max  P( z,  s  |  y)  (8) 

Z,S 

where  P( z,s  |  y)  is  the  joint  posterior  probability  of  z  and  s, 
conditioned  on  the  observed  y. 


P mim2k  [x(Vt;[  7  ^2 ) ] 


(6) 


z(nl,n2)  H 
=  <  z(m,n2), 

K  z{ni,n2)  H 


riZ\‘m1,m2)-50 

E  !x  E,3  h2(m1,m2-,liM) 

r^)(mi,m2)+^0 
Eij  E,3  h2(m1,m2-,l1,l2) 


(  z') 

h(mi,m2\ni,n2),  ifr^  (rai,m2)  >  60 
if  -<j0  <  Jj.  (mi,m2)  <  <50 
h(m1,m2-,ni,n2),  if  >(m1,m2)  <  -50 


Applying  Bayes  theorem,  (8)  may  be  rewritten  as 


(z,  s)  =  argmax 

Z,S 


p(  y  1  z,s)P(z,s) 
P(y) 


(9) 


Since  z  and  s  are  statistically  independent,  P(z,s)  = 
P(z)P(s).  Moreover,  P(y)  is  not  a  function  of  z  or  s,  so 
it  may  be  omitted  from  the  optimization  process  with  respect 
to  z  and  s.  Therefore,  the  estimates  are  given  by 


(z,s)  =  argmax  P(y  |  z,s)P(z)P(s)  (10) 

Z,S 

Equivalently  one  aims  to  minimize  the  negative  logarithm 
of  the  function  in  (10).  Thus 


(z,s)  =  argmin  {-log[P(y  |  z, s)]  - log[P(z)]  - log[P(s)]} 

Z.S 

(ID 

Now  we  should  determine  the  prior  image  probability  den¬ 
sity  function  (pdf)  P(z),  the  prior  motion  probability  density 
function  P(s),  and  the  conditional  probability  density  function 

P{ y|z,s). 

The  prior  image  pdf  which  preserves  convexity,  may  be 
expressed  as 

P 0)  =  -j  exp  {— A/(z)}  (12) 

where  A  is  some  normalizing  constant  which  ensures  that  P(z) 
is  a  probability,  A  is  some  positive  constant,  and  /( z)  is  a 
function  of  the  values  of  the  HR  image  z.  This  function  may 
be  chosen  to  encourage  neighboring  pixels  to  have  similar 
values  so  that  the  first  derivative  of  the  HR  image  function  is 
continuous  (the  so  called  “membrane  model”),  or  so  that  the 
second  derivative  of  the  data  is  continuous  (the  so  called  “thin 
plate  model”). 

The  prior  model  for  the  registration  parameters  s  is  highly 
application  specific.  In  general,  P(s)  may  be  dropped  from 
the  cost  function,  implying  that  we  do  not  have  any  prior 
knowledge  about  the  registration  parameters. 

The  elements  of  n  in  (2)  are  assumed  to  be  independent  and 
identically  distributed  (iid)  Gaussian  samples  with  variance 
<r^.  Therefore,  the  multivariate  pdf  of  n  is  given  by: 

1  f  i  PM  ) 

p(n)=(2ir)^<,™ex,>r^£"‘/  <i3) 

Given  the  observation  model  in  (2)  and  the  noise  pdf  in 
(13),  the  conditional  pdf  P(y|z,s)  may  be  written  as 

p( y  I  z-s)  =  A  exp|--]T(y  ^  Wsz)T(y- Wsz) 

(27t)t  (jN  2<J„ 

(14) 

The  aim  is  to  minimize  the  cost  function  (11)  with  respect 
to  z  and  s.  Note  that  (11)  is  not  readily  differentiable  with 


respect  to  s  for  many  motion  models.  In  addition,  depending 
on  the  choice  of  /( z),  (11)  usually  is  a  quadratic  function  in 
z  and  can  be  minimized  with  respect  to  z  if  s  is  fixed.  The 
authors  in  [6]  apply  a  cyclic  coordinate-descent  optimization 
approach  in  which  z  is  fixed  and  s  estimated,  and  then  s  is 
fixed  and  z  is  estimated.  The  iterative  algorithm  terminates 
when  it  adequately  converges  or  a  pre-set  number  of  iterations 
is  reached.  The  initial  estimate  z1  may  be  an  interpolation  of 
the  first  LR  image. 

The  estimated  LR  images  are  compared  with  the  HR  esti¬ 
mate  zn  after  it  is  projected  through  the  observation  model. 
The  estimation  of  the  registration  parameters  sj?  effectively 
results  in  the  estimation  of  the  weights  w^m  m  n  n2  f°r  the 
nth  iteration.  Once  we  have  updated  all  sj),  we  aim  in  updating 
the  HR  image  z. 


IV.  Experiments 

A  simulated  sequence  of  random  translational  shifts  is  used 
to  generate  a  series  of  translated  LR  images  (each  of  size 
50  x  50  pixels)  from  the  16  bit  grayscale  image  shown  in  Fig 
1(a).  The  camera  is  assumed  to  move  so  that  all  LR  frames 
it  captures  share  the  same  image  plane.  In  other  words,  no 
zooming,  paning,  or  tilting  of  the  camera  motion  is  allowed.  To 
simulate  in  practice  such  data,  from  each  location  we  capture 
a  HR  frame  which  we  blur  with  PSF 
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and  subsample  by  a  factor  of  4  in  each  direction.  No  noise  is 
added  to  each  LR  frame.  The  first  LR  frame  in  the  generated 
sequence  is  presented  in  Fig.  1(b). 

The  first  LR  frame  is  considered  as  the  reference  LR 
frame.  All  LR  frames  are  bilinearly  interpolated  to  the  desired 
high  resolution.  The  interpolated  frames  are  now  registered 
to  the  SR  grid,  which  has  been  set  up  by  the  interpolated 
version  of  the  LR  reference  frame,  using  normalized  cross- 
correlation.  Thus  we  can  accurately  identify  the  translational 
displacements. 

The  initial  HR  image  estimate  is  the  bilinearly  interpolated 
version  of  the  first  LR  frame  as  shown  in  Fig.  1(c).  For 
each  of  the  IBP,  POCS,  and  MAP  estimation  algorithms, 
20  iterations  are  performed.  The  estimated  SR  images  are 
presented  respectively  in  Fig.  1(d)- 1(f). 

Fig.  2(a)  depicts  the  mean  absolute  error  (MAE)  per  pixel 
for  each  method  versus  the  number  of  iterations  used  to 
estimate  the  SR  image.  POCS  and  MAP  estimation  are  shown 
to  be  consistently  more  effective  in  SR  reconstruction  than 
IBP.  For  comparison,  the  MAE’s  of  the  image  formed  by 


(d)  (e)  (f) 

Fig.  1.  (a)  Original  image,  (b)  Simulated  low-resolution  frame  1  (L i  = 

L2  =  4).  (c)  Bilinear  interpolation  of  frame  1.  (d)  IBP:  Super-resolution 
image,  (e)  POCS:  Super-resolution  image,  (f)  MAP:  Super-resolution  image. 


(a)  (b) 

Fig.  2.  (a)  Mean  absolute  error  per  pixel  vs  number  of  iterations  using  16 

frames,  (b)  Mean  absolute  error  per  pixel  vs  number  of  frames  used  for  10 
iterations. 


bilinear  interpolation  of  a  single  LR  frame  is  also  shown.  Fig. 
2(b)  shows  the  MAE  per  pixel  for  all  three  methods  using 
various  numbers  of  LR  frames.  We  can  see  that  with  only 
four  frames,  the  performance  is  comparable  with  that  of  the 
bicubic  interpolator.  Increasing  the  number  of  the  available  LR 
frames  results  in  improved  restoration.  Note  that  16  frames  is 
the  minimum  theoretical  limit  in  order  to  solve  the  SR  problem 
with  our  data.  The  improvement  is  dramatic  until  this  limit 
is  reached  when  using  POCS  and  MAP  estimation  methods. 
Additional  frames  improve  only  slightly  the  SR  image. 

The  computational  cost  of  the  algorithms  is  rather  high  for 
real  time  implementation.  For  a  reconstruction  of  an  image 
with  200  x  200  pixels  from  16  frames  of  50  x  50  pixels  each, 
approximately  10  iterations  are  needed.  The  Matlab  code  used 
required  3.85s,  4.61s  and  6.67s  per  iteration  for  the  POCS, 
MAP,  and  IBP  algorithms  respectively. 

Next  we  examine  the  tolerance  to  registration  errors.  We 
compute  the  HR  image  using  each  of  the  three  methods 
for  various  levels  of  misregistration  (see  Fig.  3).  Of  the 
three  methods  compared,  the  IBP  approach  showed  the  most 
resilience  to  misregistration  errors.  POCS  and  MAP  are  quite 
sensitive  to  registration  errors,  with  POCS  being  a  little  more 


sensitive  than  MAP.  However,  both  techniques  are  clearly 
superior  compared  with  IBP  when  registration  is  successful. 


Fig.  3.  Mean  absolute  error  per  pixel  vs  mean  registration  error  in  units  of 
HR  pixels. 


In  practice,  POCS  and  MAP  are  comparable  in  terms  of 
speed  and  performance.  They  result  in  sufficient  reconstruction 
of  the  desired  HR  image  with  a  small  number  of  iterations. 
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